latent position
From which world is your graph
Cheng Li, Felix MF Wong, Zhenming Liu, Varun Kanade
Discovering statistical structure from links is a fundamental problem in the analysis of social networks. Choosing a misspecified model or, equivalently, an incorrect inference algorithm will result in an invalid analysis or even falsely uncover patterns that are in fact artifacts of the model. This work focuses on unifying two of the most widely used link-formation models: the stochastic blockmodel (SBM) and the small world (or latent space) model (SWM). Integrating techniques from kernel learning, spectral graph theory, and nonlinear dimensionality reduction, we develop the first statistically sound polynomial-time algorithm to discover latent patterns in sparse graphs for both models. When the network comes from an SBM, the algorithm outputs a block structure. When it is from an SWM, the algorithm outputs estimates of each node's latent position.
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- (2 more...)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Communications > Networks (1.00)
- (2 more...)
Bayesian Distributional Models of Executive Functioning
Kasumba, Robert, Lu, Zeyu, Marticorena, Dom CP, Zhong, Mingyang, Beggs, Paul, Pahor, Anja, Ramani, Geetha, Goffney, Imani, Jaeggi, Susanne M, Seitz, Aaron R, Gardner, Jacob R, Barbour, Dennis L
This study uses controlled simulations with known ground-truth parameters to evaluate how Distributional Latent Variable Models (DLVM) and Bayesian Distributional Active LEarning (DALE) perform in comparison to conventional Independent Maximum Likelihood Estimation (IMLE). DLVM integrates observations across multiple executive function tasks and individuals, allowing parameter estimation even under sparse or incomplete data conditions. DLVM consistently outperformed IMLE, especially with smaller amounts of data, and converged faster to highly accurate estimates of the true distributions. In a second set of analyses, DALE adaptively guided sampling to maximize information gain, outperforming random sampling and fixed test batteries, particularly within the first 80 trials. These findings establish the advantages of combining DLVM's cross-task inference with DALE's optimal adaptive sampling, providing a principled basis for more efficient cognitive assessments.
- North America > United States > Maryland > Prince George's County > College Park (0.14)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
- North America > United States > Missouri > St. Louis County > St. Louis (0.04)
- (4 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Testing for correlation between network structure and high-dimensional node covariates
Fuchs-Kreiss, Alexander, Levin, Keith
In many application domains, networks are observed with node-level features. In such settings, a common problem is to assess whether or not nodal covariates are correlated with the network structure itself. Here, we present four novel methods for addressing this problem. Two of these are based on a linear model relating node-level covariates to latent node-level variables that drive network structure. The other two are based on applying canonical correlation analysis to the node features and network structure, avoiding the linear modeling assumptions. We provide theoretical guarantees for all four methods when the observed network is generated according to a low-rank latent space model endowed with node-level covariates, which we allow to be high-dimensional. Our methods are computationally cheaper and require fewer modeling assumptions than previous approaches to network dependency testing. We demonstrate and compare the performance of our novel methods on both simulated and real-world data.
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.67)
- Health & Medicine > Therapeutic Area > Neurology (0.45)
- Information Technology > Data Science (1.00)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.92)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.67)
Matrix factorisation and the interpretation of geodesic distance
Given a graph or similarity matrix, we consider the problem of recovering a notion of true distance between the nodes, and so their true positions. We show that this can be accomplished in two steps: matrix factorisation, followed by nonlinear dimension reduction. This combination is effective because the point cloud obtained in the first step lives close to a manifold in which latent distance is encoded as geodesic distance.
- North America > United States > California > Los Angeles County > Los Angeles (0.05)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > Spain > Galicia > Madrid (0.04)
- (7 more...)
- Information Technology (0.93)
- Health & Medicine > Therapeutic Area (0.46)
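The two-step recipe in the abstract above (matrix factorisation, then nonlinear dimension reduction) can be sketched in a few lines. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the Gaussian similarity matrix, the embedding dimension `d=5`, the neighbourhood size `k=3`, and all function names are hypothetical choices, and the second step is a hand-rolled Isomap-style procedure (k-nearest-neighbour graph, shortest paths, classical multidimensional scaling).

```python
import numpy as np

def spectral_embed(S, d):
    """Step 1: factorise a symmetric similarity matrix, keeping the top-d eigenpairs."""
    vals, vecs = np.linalg.eigh(S)
    top = np.argsort(vals)[::-1][:d]
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))

def geodesic_distances(X, k):
    """Approximate geodesic distances by shortest paths on a symmetric k-NN graph."""
    n = X.shape[0]
    E = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    G = np.full((n, n), np.inf)
    nearest = np.argsort(E, axis=1)[:, 1:k + 1]  # position 0 is the point itself
    rows = np.repeat(np.arange(n), k)
    G[rows, nearest.ravel()] = E[rows, nearest.ravel()]
    G = np.minimum(G, G.T)                       # make the graph undirected
    np.fill_diagonal(G, 0.0)
    for m in range(n):                           # Floyd-Warshall relaxation
        G = np.minimum(G, G[:, [m]] + G[[m], :])
    return G

def classical_mds(D, d):
    """Step 2 (final part): recover d coordinates from a distance matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:d]
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))

# Hypothetical example: one-dimensional latent positions and a Gaussian similarity.
z = np.linspace(0.0, 2.0, 80)
S = np.exp(-(z[:, None] - z[None, :]) ** 2)

X = spectral_embed(S, d=5)            # point cloud near a one-dimensional manifold
D = geodesic_distances(X, k=3)        # distances measured along that manifold
z_hat = classical_mds(D, d=1)[:, 0]   # estimate of z, up to shift, scale and sign
```

In this toy setting `z_hat` should track `z` closely up to an affine map, which is the sense in which latent distance is read off as geodesic distance. With a sampled graph rather than a noise-free similarity matrix, the choices of `d` and `k` would matter much more.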
Weighted Random Dot Product Graphs
Marenco, Bernardo, Bermolen, Paola, Fiori, Marcelo, Larroca, Federico, Mateos, Gonzalo
Modeling of intricate relational patterns has become a cornerstone of contemporary statistical research and related data science fields. Networks, represented as graphs, offer a natural framework for this analysis. This paper extends the Random Dot Product Graph (RDPG) model to accommodate weighted graphs, markedly broadening the model's scope to scenarios where edges exhibit heterogeneous weight distributions. We propose a nonparametric weighted (W)RDPG model that assigns a sequence of latent positions to each node. Inner products of these nodal vectors specify the moments of their incident edge weights' distribution via moment-generating functions. In this way, and unlike prior art, the WRDPG can discriminate between weight distributions that share the same mean but differ in higher-order moments. We derive statistical guarantees for an estimator of the nodes' latent positions adapted from the workhorse adjacency spectral embedding, establishing its consistency and asymptotic normality. We also contribute a generative framework that enables sampling of graphs that adhere to a (prescribed or data-fitted) WRDPG, facilitating, e.g., the analysis and testing of observed graph metrics using judicious reference distributions. The paper is organized to formalize the model's definition, the estimation (or nodal embedding) process and its guarantees, as well as the methodologies for generating weighted graphs, all complemented by illustrative and reproducible examples showcasing the WRDPG's effectiveness in various network analytic applications.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > Central America (0.04)
- South America > Uruguay (0.04)
- (2 more...)
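For readers unfamiliar with the adjacency spectral embedding mentioned in the abstract above, the following is a minimal sketch of the standard unweighted version on a simulated RDPG. It is not the weighted estimator proposed in the paper. The function names, the latent-position range, and the graph size are assumptions made for illustration.

```python
import numpy as np

def sample_rdpg(X, rng):
    """Sample a symmetric, hollow adjacency matrix with P[i, j] = <X[i], X[j]>."""
    P = X @ X.T
    upper = np.triu(rng.random(P.shape) < P, k=1)  # independent edges for i < j
    return (upper + upper.T).astype(float)

def adjacency_spectral_embedding(A, d):
    """Scale the top-d eigenvectors (by |eigenvalue|) by the root of |eigenvalue|."""
    vals, vecs = np.linalg.eigh(A)
    top = np.argsort(np.abs(vals))[::-1][:d]
    return vecs[:, top] * np.sqrt(np.abs(vals[top]))

rng = np.random.default_rng(0)
# Hypothetical one-dimensional latent positions; inner products lie in [0.04, 0.49].
X = rng.uniform(0.2, 0.7, size=(300, 1))
A = sample_rdpg(X, rng)
X_hat = adjacency_spectral_embedding(A, d=1)  # estimates X up to sign
```

Latent positions in an RDPG are identifiable only up to an orthogonal transformation, which in one dimension reduces to a sign flip, so any comparison of `X_hat` with `X` should account for that.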
Signal Recovery from Random Dot-Product Graphs Under Local Differential Privacy
Vishwanath, Siddharth, Hehir, Jonathan
We consider the problem of recovering latent information from graphs under $\varepsilon$-edge local differential privacy where the presence of relationships/edges between two users/vertices remains confidential, even from the data curator. For the class of generalized random dot-product graphs, we show that a standard local differential privacy mechanism induces a specific geometric distortion in the latent positions. Leveraging this insight, we show that consistent recovery of the latent positions is achievable by appropriately adjusting the statistical inference procedure for the privatized graph. Furthermore, we prove that our procedure is nearly minimax-optimal under local edge differential privacy constraints. Lastly, we show that this framework allows for consistent recovery of geometric and topological information underlying the latent positions, as encoded in their persistence diagrams. Our results extend previous work from the private community detection literature to a substantially richer class of models and inferential tasks.
- Asia > Middle East > Jordan (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (4 more...)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Communications (1.00)
- (2 more...)
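A common local differential privacy mechanism for edges is randomized response, where each potential edge is flipped independently with probability 1/(1 + e^ε). The sketch below assumes that mechanism, since the abstract above does not name one, and shows only the affine correction of the expected adjacency matrix. It is not the authors' inference procedure. The function names and parameters are hypothetical, and flipping each unordered pair once is a simplification of a truly local protocol.

```python
import numpy as np

def randomized_response(A, eps, rng):
    """Flip each unordered node pair independently with probability q = 1/(1+e^eps)."""
    q = 1.0 / (1.0 + np.exp(eps))
    flips = np.triu(rng.random(A.shape) < q, k=1)
    flips = flips | flips.T                # keep the released graph symmetric
    A_priv = np.where(flips, 1.0 - A, A)
    np.fill_diagonal(A_priv, 0.0)          # no self-loops
    return A_priv, q

def debias(A_priv, q):
    """Invert E[A_priv] = (1 - 2q) * A + q entrywise (off the diagonal)."""
    A_hat = (A_priv - q) / (1.0 - 2.0 * q)
    np.fill_diagonal(A_hat, 0.0)
    return A_hat

rng = np.random.default_rng(1)
n = 400
upper = np.triu(rng.random((n, n)) < 0.1, k=1)   # toy graph with edge density 0.1
A = (upper | upper.T).astype(float)

A_priv, q = randomized_response(A, eps=1.0, rng=rng)
A_hat = debias(A_priv, q)                        # unbiased for A, but much noisier
```

The relation E[A_priv] = (1 - 2q) A + q is what makes the distortion a predictable affine map of the edge probabilities, so that downstream spectral estimates can be corrected rather than discarded.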
Perfect Clustering in Nonuniform Hypergraphs
Chan, Ga-Ming Angus, Lubberts, Zachary
While there has been tremendous activity in the area of statistical network inference on graphs, hypergraphs have not enjoyed the same attention, on account of their relative complexity and the lack of tractable statistical models. We introduce a hyper-edge-centric model for analyzing hypergraphs, called the interaction hypergraph, which models natural sampling methods for hypergraphs in neuroscience and communication networks, and accommodates interactions involving different numbers of entities. We define latent embeddings for the interactions in such a network, and analyze their estimators. In particular, we show that a spectral estimate of the interaction latent positions can achieve perfect clustering once enough interactions are observed.
- North America > United States > Virginia > Albemarle County > Charlottesville (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Learning Joint and Individual Structure in Network Data with Covariates
James, Carson, Yuan, Dongbang, Gaynanova, Irina, Arroyo, Jesús
Network data is ubiquitous in many disciplines and application domains, including computer science, statistics, biology, and physics. These data, encoding relationships between units represented as nodes, are often accompanied by additional information about the nodes, usually referred to as node covariates, attributes, or metadata (Newman and Clauset, 2016; Liu, 2019; Chunaev, 2020). In these situations, a common goal is to understand the associations between the network connectivity and the node covariates. In our example, we consider international food commodity trade data represented as a network, where the nodes correspond to different countries and edge weights encode food commodity trade volumes between corresponding countries. The covariates at each node consist of economic and geographic information for each country, such as gross domestic product (GDP) per capita, birth rate and region. We wish to exploit that both datasets contain information about the nodes in order to better understand the structure of the network, node covariates and their relationship. Specifically, we seek to understand how economic and geographic factors explain the observed trade between countries, and identify additional information in the network that cannot be explained solely by these variables. There has been substantial work that incorporates network and node covariate information. Some examples include methods that use node covariates to improve community detection (Binkiewicz et al., 2017; Huang et al., 2023), dimensionality reduction (Zhao et al., 2022), regression with network information (Li et al., 2019) and mixed effect models for network edges (Hoff, 2005).
- North America > United States > Tennessee > Anderson County > Oak Ridge (0.14)
- Oceania (0.04)
- Africa (0.04)
- (66 more...)
- Banking & Finance (0.86)
- Government (0.67)
- Health & Medicine > Therapeutic Area (0.67)
- (2 more...)
Gaussian Embedding of Temporal Networks
Romero, Raphaël, Lijffijt, Jefrey, Rastelli, Riccardo, Corneli, Marco, De Bie, Tijl
Representing the nodes of continuous-time temporal graphs in a low-dimensional latent space has wide-ranging applications, from prediction to visualization. Yet, analyzing continuous-time relational data with timestamped interactions introduces unique challenges due to its sparsity. Merely embedding nodes as trajectories in the latent space overlooks this sparsity, emphasizing the need to quantify uncertainty around the latent positions. In this paper, we propose TGNE (Temporal Gaussian Network Embedding), an innovative method that bridges two distinct strands of literature: the statistical analysis of networks via Latent Space Models (LSM) (Hoff et al., 2002) and temporal graph machine learning. TGNE embeds nodes as piece-wise linear trajectories of Gaussian distributions in the latent space, capturing both structural information and uncertainty around the trajectories. We evaluate TGNE's effectiveness in reconstructing the original graph and modelling uncertainty. The results demonstrate that TGNE generates competitive time-varying embedding locations compared to common baselines for reconstructing unobserved edge interactions based on observed edges. Furthermore, the uncertainty estimates align with the time-varying degree distribution in the network, providing valuable insights into the temporal dynamics of the graph. To facilitate reproducibility, we provide an open-source implementation of TGNE at https://github.com/aida-ugent/tgne.
- Europe > France > Provence-Alpes-Côte d'Azur > Alpes-Maritimes > Nice (0.04)
- North America > United States > California (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- (2 more...)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Communications (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)